CASIA-BiRViT1K: Bilingual Road scene Video Text Dataset

1. Introduction

The Bilingual Road scene Video Text Dataset (BiRViT1K) was constructed by the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA). It contains 1,000 videos: 300 Chinese videos, 300 English videos and 400 bilingual videos. We annotate a total of 64,001 frames with 806,011 text instances at the line level, and every text instance is labeled with a quadrilateral, a transcript and a tracking identification (ID). We randomly select 70% of the videos of each type as the training set and use the rest as the test set, so the training set contains 44,808 frames from 700 videos and the test set contains 19,193 frames from 300 videos. Fig. 1 shows some images of different scenes in this dataset.

Fig. 1 Some images of different scenes in BiRViT1K.
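
The per-type 70/30 split described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the video names are placeholders, and the function name `split_by_type`, the seed and the rounding rule are hypothetical, since the actual procedure used to build the released split is not specified beyond "randomly select 70% of the videos of each type". The released split should be used as provided.

```python
import random

def split_by_type(videos_by_type, train_ratio=0.7, seed=0):
    """Randomly split each type's videos into train/test at the given ratio."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    train, test = [], []
    for videos in videos_by_type.values():
        shuffled = list(videos)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test

# Placeholder video names; the real file names are not given in this document.
videos_by_type = {
    "chinese": [f"chinese_{i:03d}" for i in range(300)],
    "english": [f"english_{i:03d}" for i in range(300)],
    "bilingual": [f"bilingual_{i:03d}" for i in range(400)],
}

train, test = split_by_type(videos_by_type)
print(len(train), len(test))  # 700 300
```

Splitting within each type keeps the Chinese, English and bilingual proportions identical in both sets (210/210/280 training videos and 90/90/120 test videos).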

CASIA-BiRViT1K_part01.rar

CASIA-BiRViT1K_part02.rar

CASIA-BiRViT1K_part03.rar

CASIA-BiRViT1K_part04.rar

CASIA-BiRViT1K_part05.rar

CASIA-BiRViT1K_part06.rar

CASIA-BiRViT1K_part07.rar

CASIA-BiRViT1K_part08.rar


2. Annotations

We annotate the text instances in the videos, covering Chinese characters, English letters, Arabic numerals and common symbols (e.g. commas, periods and spaces). The dataset uses a quadrilateral annotation format: the label of each text instance consists of the coordinates of the four corners of the text box, the transcript and the tracking identification (ID). If a text instance is hard to recognize or most of its area is truncated, we record its transcript as "###". Fig. 2 shows some annotated video frames. As shown in Fig. 1 and Fig. 2, the text instances in our dataset are small in scale and diverse in form (license plates, shop names, traffic signs, etc.), which makes the dataset more challenging.

Fig. 2 The annotations of video frames.
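
One possible in-memory representation of a labeled text instance is sketched below. The class name `TextInstance`, its field names and the integer type of the tracking ID are assumptions made for illustration, not part of the dataset. The area computation uses the shoelace formula and assumes the four corners are listed in consecutive order around the box, which this document does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Transcript marker the dataset uses for hard-to-recognize or truncated text.
IGNORE_TRANSCRIPT = "###"

@dataclass
class TextInstance:
    quad: List[Tuple[float, float]]  # four (x, y) corners of the text box
    transcript: str
    track_id: int  # integer type is an assumption

    @property
    def is_ignored(self) -> bool:
        """True when the transcript is the '###' marker."""
        return self.transcript == IGNORE_TRANSCRIPT

    def area(self) -> float:
        """Quadrilateral area via the shoelace formula (corners in order)."""
        total = 0.0
        for i in range(4):
            x1, y1 = self.quad[i]
            x2, y2 = self.quad[(i + 1) % 4]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0

# Made-up example values, not taken from the dataset.
inst = TextInstance([(10, 20), (110, 20), (110, 50), (10, 50)], "STOP", 7)
print(inst.is_ignored, inst.area())  # False 3000.0
```

Instances marked "###" are typically excluded from recognition scoring in similar benchmarks; whether and how to do so here is left to the evaluation protocol in use.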


3. Dataset Format

We provide two label formats:
(1) A txt file is provided for each image. Each line represents one text instance and contains the corner coordinates, the text content and the ID, separated by tab characters ('\t'):
"x1,y1,x2,y2,x3,y3,x4,y4 text ID"
(2) A json file is provided for the training set and the test set respectively. The format of the annotation file is as follows:
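
Returning to the per-image txt labels in format (1), a single label line could be parsed roughly as follows. This is a minimal sketch, not an official loader: the function name `parse_label_line` and the returned dictionary keys are hypothetical, the ID is kept as a string because its type is not specified, and the transcript is assumed to contain no tab characters.

```python
def parse_label_line(line):
    """Parse one txt label line: 'x1,y1,...,x4,y4<TAB>text<TAB>ID'."""
    # Exactly three tab-separated fields are expected (assumption: no tabs in text).
    coords_field, text, track_id = line.rstrip("\r\n").split("\t")
    values = [float(v) for v in coords_field.split(",")]
    if len(values) != 8:
        raise ValueError(f"expected 8 coordinates, got {len(values)}")
    # Pair the flat list into four (x, y) corners.
    quad = [(values[i], values[i + 1]) for i in range(0, 8, 2)]
    return {
        "quad": quad,
        "text": text,
        "id": track_id,            # kept as a string; type is not specified
        "ignore": text == "###",   # marker for hard-to-recognize or truncated text
    }

# Made-up example line, not taken from the dataset.
example = parse_label_line("10,20,110,20,110,50,10,50\tSTOP\t7\n")
print(example["quad"][0], example["text"], example["id"], example["ignore"])
```

Splitting on tabs first, and only then on commas inside the first field, keeps transcripts that contain commas or spaces intact.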

4. Conditions of Use

  • The CASIA-BiRViT1K: Bilingual Road scene Video Text Dataset, built by CASIA, is released for academic research free of charge under an agreement.
  • Commercial use of the dataset is subject to a fee. To inquire about a commercial license, please contact Fei Yin (fyin@nlpr.ia.ac.cn).
  • The application form for academic use of the dataset can be downloaded below:


          English version

          Chinese version



    Contact

    Fei Yin (fyin@nlpr.ia.ac.cn)

    National Laboratory of Pattern Recognition (NLPR)

    Institute of Automation of Chinese Academy of Sciences

    95 Zhongguancun East Road, Beijing 100190, P.R. China